A New Framework to Deal with OOV Words in SLT System
Abstract
Automatic spoken language translation (SLT) is considered one of the most challenging tasks in modern computer science and technology. Handling Out-Of-Vocabulary (OOV) words has always been a hard problem in SLT. The traditional SLT framework often fails to translate OOV words because of data sparseness. In this paper, based on an analysis of common OOV expressions that appear in SLT, we propose a new framework for bidirectional Chinese-English SLT in which a series of approaches to translating OOV expressions are presented. Experimental results show that our framework and approaches are effective and can greatly improve translation performance.
Similar resources
Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting
Islamic Republic of Iran Broadcasting (IRIB), as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieval of this archive, is one of the key issues for this broadcasting...
Learning an Expert from Human Annotations in Statistical Machine Translation: the Case of Out-of-Vocabulary Words
We present a general method for incorporating an “expert” model into a Statistical Machine Translation (SMT) system, in order to improve its performance on a particular “area of expertise”, and apply this method to the specific task of finding adequate replacements for Out-of-Vocabulary (OOV) words. Candidate replacements are paraphrases and entailed phrases, obtained using monolingual resource...
Learning Out-of-Vocabulary Words in Automatic Speech Recognition
Out-of-vocabulary (OOV) words are unknown words that appear in the testing speech but not in the recognition vocabulary. They are usually important content words such as names and locations which contain information crucial to the success of many speech recognition tasks. However, most speech recognition systems are closed-vocabulary recognizers that only recognize words in a fixed finite vocab...
Multi Class-based n-gram Language Model for New Words Using Web Data
Out-of-vocabulary (OOV) words cause a serious problem for automatic speech recognition (ASR) systems. Not only will an OOV word be misrecognized as an in-vocabulary word with similar phonetics, but the error will also cause errors in nearby words. Language models (LMs) for most open-vocabulary ASR systems treat OOV words as one entity, ignoring the linguistic information. In this paper we pres...
Monolingual Distributional Profiles for Word Substitution in Machine Translation
Out-of-vocabulary (OOV) words present a significant challenge for Machine Translation. For low-resource languages, limited training data further increases the frequency of OOV words and degrades the quality of the translations. Past approaches have suggested using stems or synonyms for OOV words. Unlike the previous methods, we propose handling not just the OOV words but rare words as well in a...
Publication date: 2011